An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

Authors

  • Jiaming Song
  • Yuhuai Wu
Abstract

Deep reinforcement learning methods have shown tremendous success in a large variety of tasks, such as Go [Silver et al., 2016], Atari [Mnih et al., 2013], and continuous control [Lillicrap et al., 2015, Schulman et al., 2015]. Policy gradient methods [Williams, 1992] are an important family of methods in model-free reinforcement learning, and the current state-of-the-art policy gradient methods are Proximal Policy Optimization (PPO) [Schulman et al., 2017] and ACKTR [Wu et al., 2017]. The two methods, however, take different approaches to improving sample efficiency: PPO uses a particular “clipping” objective that mimics a trust region, whereas ACKTR uses approximate natural gradients that balance speed and optimization.
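To make the "clipping" objective concrete, the following is a minimal NumPy sketch of the standard PPO clipped surrogate, written under stated assumptions rather than taken from this paper's code. The function name `ppo_clipped_objective`, its argument names, and the default `clip_eps=0.2` are illustrative choices, not values reported in the abstract.

```python
import numpy as np


def ppo_clipped_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized), averaged over samples.

    All names and the default clip_eps are assumptions for illustration.
    """
    log_prob_new = np.asarray(log_prob_new, dtype=float)
    log_prob_old = np.asarray(log_prob_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the new and the old policy for each action.
    ratio = np.exp(log_prob_new - log_prob_old)

    # Unclipped surrogate and its clipped counterpart.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # The elementwise minimum removes the incentive to move the ratio
    # outside [1 - clip_eps, 1 + clip_eps], which mimics a trust region.
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    # Two samples: one with a ratio above the clip range and a positive
    # advantage, one with a ratio below the range and a negative advantage.
    new_lp = np.log([1.5, 0.5])
    old_lp = np.zeros(2)
    adv = [1.0, -1.0]
    print(ppo_clipped_objective(new_lp, old_lp, adv))
```

In this sketch the first sample contributes the clipped value because the ratio exceeds `1 + clip_eps` with a positive advantage, and the second contributes the clipped value because the ratio falls below `1 - clip_eps` with a negative advantage. ACKTR's natural-gradient update is a different mechanism and is not shown here.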


Similar resources

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronec...


A Kronecker-factored approximate Fisher matrix for convolution layers

Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present K...


Optimization of thermal curing cycle for a large epoxy model

Heat generation in an exothermic reaction during the curing process and low thermal conductivity of the epoxy resin produces high peak temperature and temperature gradients which result in internal and residual stresses, especially in large epoxy samples. In this paper, an optimization algorithm was developed and applied to predict the thermal cure cycle to minimize the temperature peak and the...


Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control

In this work we introduce the application of black-box quantum control as an interesting reinforcement learning problem to the machine learning community. We analyze the structure of the reinforcement learning problems arising in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks trained via stochastic policy gradients yield a general method to solving...


Optimization and sound absorption modeling in Yucca Gloriosa natural fiber composite

Introduction: Nowadays, the acoustic behavior analysis of natural fibers composites has received increasing attention by researchers. In this regard, the present study aimed to optimize and model the sound absorption behavior of composites made of Yucca Gloriosa (YG) fiber via using a mathematical modeling approach. Methodology: In this experimental cross-sectional study, in order to fabricate...



Journal:
  • CoRR

Volume abs/1801.05566

Publication date: 2018